Hydrology#
In the following sections, we use Python to demonstrate how to gather data for a specific watershed in the HYDAT database
Environment setup#
[1]:
import xarray as xr
import intake
import hvplot.xarray
Accessing the data#
We are now ready to access our catalog which uses Intake to organize all our datasets.
[2]:
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat=intake.open_catalog(catalog_url)
cat
main:
args:
path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
description: Master Data Catalog
driver: intake.catalog.local.YAMLFileCatalog
metadata: {}
Let’s open ERA5-Land reanalysis (reference dataset) from which we will gather data at our watershed of interest
[3]:
ds_era5l = cat.atmosphere.era5_land_reanalysis.to_dask()
ds_era5l
[3]:
<xarray.Dataset>
Dimensions: (latitude: 701, longitude: 1171, time: 636240)
Coordinates:
* latitude (latitude) float64 85.0 84.9 84.8 84.7 ... 15.3 15.2 15.1 15.0
* longitude (longitude) float64 -167.0 -166.9 -166.8 ... -50.2 -50.1 -50.0
* time (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
Data variables:
sd (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
t2m (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>
tp (time, latitude, longitude) float32 dask.array<chunksize=(8760, 7, 7), meta=np.ndarray>Here, we open HYDAT dataset as a xarray Dataset :
[4]:
ds_hydat = cat.hydrology.hydat.to_dask()
ds_hydat
[4]:
<xarray.Dataset>
Dimensions: (data_type: 2, id: 7881, latitude: 2800, longitude: 4680,
spatial_agg: 2, time: 59413, time_agg: 1, timestep: 1)
Coordinates: (12/13)
* data_type (data_type) <U5 'flow' 'level'
drainage_area (id) float64 dask.array<chunksize=(10,), meta=np.ndarray>
* id (id) <U7 '01AA002' '01AD001' ... '11AF004' '11AF005'
* latitude (latitude) float64 85.0 84.97 84.95 ... 15.07 15.05 15.02
* longitude (longitude) float64 -167.0 -167.0 -166.9 ... -50.05 -50.02
name (id) object dask.array<chunksize=(7881,), meta=np.ndarray>
... ...
regulated (id) float64 dask.array<chunksize=(10,), meta=np.ndarray>
source (id) object dask.array<chunksize=(7881,), meta=np.ndarray>
* spatial_agg (spatial_agg) object 'point' 'watershed'
* time (time) datetime64[ns] 1860-01-01 1860-01-02 ... 2022-08-31
* time_agg (time_agg) <U4 'mean'
* timestep (timestep) <U3 'day'
Data variables:
mask (id, latitude, longitude) float64 dask.array<chunksize=(1, 500, 500), meta=np.ndarray>
value (id, time, data_type, spatial_agg, timestep, time_agg) float64 dask.array<chunksize=(10, 59413, 1, 1, 1, 1), meta=np.ndarray>In the HYDAT dataset, we provide the mask variable which represents rasterized watershed deleniation at a 0.025 deg spatial resolution for most ids (stations) where streaflow is recorded. For instance, we can vizualize the mask for the 01AA002 station :
[5]:
%%time
station_id = '01AA002'
mask = ds_hydat.sel(id=station_id).mask
mask \
.where(mask==1, drop=True) \
.hvplot(geo=True, tiles='ESRI', width=750, height=500, colorbar=False, alpha=0.8)
CPU times: user 15.1 s, sys: 760 ms, total: 15.9 s
Wall time: 55 s
[5]:
We then interpolate our mask to fit the reference dataset’s spatial resolution :
[6]:
%%time
mask = mask.interp_like(ds_era5l)
mask = mask.where(mask==1, drop=True)
mask \
.hvplot(geo=True, tiles='ESRI', width=750, height=500, colorbar=False, alpha=0.8)
CPU times: user 13 s, sys: 521 ms, total: 13.5 s
Wall time: 51.6 s
[6]:
We can now apply the mask to the reference dataset and average the data on the mask :
[7]:
%%time
da = ds_era5l.t2m.where(mask==1, drop=True)
da
CPU times: user 12.9 s, sys: 285 ms, total: 13.1 s
Wall time: 47.3 s
[7]:
<xarray.DataArray 't2m' (time: 636240, latitude: 3, longitude: 3)>
dask.array<where, shape=(636240, 3, 3), dtype=float32, chunksize=(8760, 2, 3), chunktype=numpy.ndarray>
Coordinates:
* latitude (latitude) float64 46.6 46.5 46.4
* longitude (longitude) float64 -70.3 -70.2 -70.1
* time (time) datetime64[ns] 1950-01-01 ... 2022-07-31T23:00:00
drainage_area float64 dask.array<chunksize=(), meta=np.ndarray>
id <U7 '01AA002'
name object dask.array<chunksize=(), meta=np.ndarray>
province object dask.array<chunksize=(), meta=np.ndarray>
regulated float64 dask.array<chunksize=(), meta=np.ndarray>
source object dask.array<chunksize=(), meta=np.ndarray>
Attributes: (12/31)
GRIB_NV: 0
GRIB_Nx: 1171
GRIB_Ny: 701
GRIB_cfName: unknown
GRIB_cfVarName: t2m
GRIB_dataType: fc
... ...
GRIB_typeOfLevel: surface
GRIB_units: K
coordinates: number time step surface latitu...
long_name: 2 metre temperature
standard_name: unknown
units: K[8]:
%%time
da \
.mean(['latitude','longitude']) \
.hvplot(color='blue')
CPU times: user 4.69 s, sys: 461 ms, total: 5.15 s
Wall time: 24.4 s
[8]: